Bioinformatics and system biology approach to identify potential common pathogenesis for COVID-19 infection and osteoarthritis

A growing of evidence has showed that patients with osteoarthritis (OA) had a higher coronavirus 2019 (COVID-19) infection rate and a poorer prognosis after infected it. Additionally, scientists have also discovered that COVID-19 infection might cause pathological changes in the musculoskeletal system. However, its mechanism is still not fully elucidated. This study aims to further explore the sharing pathogenesis of patients with both OA and COVID-19 infection and find candidate drugs. Gene expression profiles of OA (GSE51588) and COVID-19 (GSE147507) were obtained from the Gene Expression Omnibus (GEO) database. The common differentially expressed genes (DEGs) for both OA and COVID-19 were identified and several hub genes were extracted from them. Then gene and pathway enrichment analysis of the DEGs were performed; protein–protein interaction (PPI) network, transcription factor (TF)-gene regulatory network, TF-miRNA regulatory network and gene-disease association network were constructed based on the DEGs and hub genes. Finally, we predicted several candidate molecular drugs related to hub genes using DSigDB database. The receiver operating characteristic curve (ROC) was applied to evaluate the accuracy of hub genes in the diagnosis of both OA and COVID-19. In total, 83 overlapping DEGs were identified and selected for subsequent analyses. CXCR4, EGR2, ENO1, FASN, GATA6, HIST1H3H, HIST1H4H, HIST1H4I, HIST1H4K, MTHFD2, PDK1, TUBA4A, TUBB1 and TUBB3 were screened out as hub genes, and some showed preferable values as diagnostic markers for both OA and COVID-19. Several candidate molecular drugs, which are related to the hug genes, were identified. These sharing pathways and hub genes may provide new ideas for further mechanistic studies and guide more individual-based effective treatments for OA patients with COVID-19 infection.

Gene ontology (GO) and pathway enrichment analysis. GO (biological processes, cellular component, and molecular functions) and pathway enrichment analysis (WikiPathways, Reactome, BioCarta, and Kyoto Encyclopedia of Genes and Genomes (KEGG)) were conducted using EnrichR online tool (https:// maaya nlab. cloud/ Enric hr/) to specify the shared function and pathways between COVID-19 and OA 27 . The P value < 0.05 was considered significantly enriched. www.nature.com/scientificreports/ PPI network analysis. STRING (http:// string-db. org) (version 11.5) is a database for the study of protein-protein association networks with increased information coverage on more than 14,000 species, 67 million proteins and 20 billion interactions, supporting functional discovery in genome-wide experimental datasets 28 . We constructed the PPI network of proteins derived from shared DEGs utilizing the STRING repository with an interaction score > 0. 15. Identification and analysis of hub genes. In the PPI network which consists of nodes, edges and their connections, nodes with the most entanglement are considered as hub genes. Cytohubba (http:// apps. cytos cape. org/ apps/ cytoh ubba) is a novel plugin of Cytoscape for extracting central elements of a biological network 29 . The hub genes were selected by applying seven algorithms (Closeness, MCC, Degree, MNC, Radiality, Stress and EPC) and intersecting them. GeneMANIA (https:// genem ania. org) 30 , a flexible user-friendly web site for analyzing gene function, was utilized to construct a co-expression network of identified hub genes.
Construction of TF-gene and miRNA-gene regulatory network. TFs govern gene transcription information and miRNAs post-transcriptionally control gene expression; hence, their activity is essential for gaining molecular insights 31,32 . NetworkAnalyst is a broad online platform for statistics, visualization, and metaanalysis of web-based gene expression data 33 . JASPAR (http:// jaspar. gener eg. net) is a public resource, provides across six classification group of TF combining spectrum of multiple species 34 . MirTarbase is a tool which helps researchers filter top miRNAs and detect biological functions and features that facilitate the development of biological hypotheses 35 . We topologically located credible TFs which tend to bind to our hub genes from the JASPAR database on the networkAnalyst platform. Then we extracted miRNAs that interacted with our hub genes from mirTarbase via networkAnalyst.

Gene-disease association analysis.
DisGeNET is a comprehensive platform that integrates information on genes and variants associated with human diseases and can be used to investigate the molecular basis of specific human diseases and their comorbidities 36 . We also examined the gene-disease relationship using DisGeNET database via NetworkAnalyst to disclose associated diseases and their complications with hub genes.
Evaluation of applicant drugs. Drug Signatures database (DSigDB) containing 22,527 gene sets was used to generate the small molecules which could downregulate the expression of hub genes 37 . The access to the DSigDB database is acquired through Enrichr (https:// amp. pharm. mssm. edu/ Enric hr/) platform 38 . Drug molecules were identified using the DSigDB via Enrichr based on the selected hub genes.
ROC curves of hub genes. The receiver operating characteristic (ROC) curve was drawn and the area under the curve (AUC) of the ROC curve was calculated using the pROC package in R in order to detect the diagnostic ability of all candidate hub genes, respectively 39 .

Results
Identification of DEGs of COVID-19 and OA. The overall flow chart of this study was shown in Fig. 1.
3569 DEGs were identified in the GSE147507 dataset, and 861 DEGs were figured out in the GSE51588 dataset. Figures 2 and 3 displayed the distribution of DEGs between COVID-19 and non-COVID-19 patients, and between OA patients and normal controls through a heatmap and a volcano plot, respectively. The intersection of DEGs of GSE147507 and GSE51588 datasets was visualized by Venn diagrams, and there were 83 shared DEGs selected (Fig. 4).
Gene ontology and pathway enrichment analysis. The most significant terms in the biological process, molecular functions, and cellular components category of GO were positive regulation of muscle contraction, tertiary granule lumen and phosphate iron binding, respectively (Fig. 5). The enriched pathways of the common DEGs between COVID-19 and OA were gathered from four global databases, including KEGG, WikiPathways, Reactome, and BioCarta, and visualized in Fig. 6. PPI network analysis. We carefully studied and visualized the PPI network in STRING to predict the interaction and adhesion pathways of common DEGs. The PPI network of common DEGs consisted of 81 nodes and 273 edges, where the PPI enrichment P value was lower than 1.75E−08, as shown in Fig. 7.
Based on GeneMANIA database, we constructed a complex gene interaction network to decipher the biological functions of these hub genes ( Fig. 9), with the co-expression of 63.34%, shared protein domains (17.75%), physical interactions of 17.58%, pathway of 0.75% and co-localization of 0.57%. The results also demonstrated that they were mainly related to the nucleoside binding, purine nucleoside binding, regulation of vacuole organization, purine ribonucleoside binding, guanyl nucleotide binding. These hub genes can be potential biomarkers, which may also lead to new therapeutic strategies for investigated diseases. Identification of disease association. Treatment design strategies for disease open the door to reveal the relationship between genes and diseases. Through the analysis of the gene-disease association by Networkanalyst, we found that arthritis, mammary neoplasms, liver cirrhosis, anemia, autistic disorder, schizophrenia, autosomal recessive predisposition, mental depression, hypertensive disease, bipolar disorder, constipation and cardiac arrhythmia are most related to our hub genes. The association between gene-disease is displayed in Fig. 12.
Identification of candidate drugs. Small molecule drugs regulating the expression of hub genes were collected from the DSigDB database on Enrichr platform. The results from the potential small molecules were generated based on their P values to represent the closeness between the small molecules and genes. Table 1 and Fig. 13 pointed out the top 10 potential small molecule drugs for hub genes.
ROC curves of hub genes. The ROC curve was plotted to evaluate the diagnostic efficacy of 14 key genes (Fig. 14).

Discussion
There is evidence that patients with osteoarthritis (OA) have a higher prevalence and worse prognosis after COVID-19 infection 11,40 . COVID-19 infection can cause pathological changes in multiple organs, including the musculoskeletal system 18,41 . Some relevant researches suggested that some inflammation and immune response might involve in it 14 . Therefore, we attempted to explore the shared function and pathways between COVID-19 and OA, and to determine the interrelationship between COVID-19 and OA. In this study, 83 shared DEGs and 14 hub genes (CXCR4, EGR2, ENO1, FASN, GATA6, HIST1H3H,  HIST1H4H, HIST1H4I, HIST1H4K, MTHFD2, PDK1, TUBA4A, TUBB1 and TUBB3) have been identified.
Among them, six genes (CXCR4, ENO1, FASN, GATA6, PDK1 and TUBB3) have been reported to be associated with the pathological mechanism of COVID-19 and OA. CXCR4 and PDK1 involved in the pathogenesis of both COVID-19 and OA. CXCR4, shorten for CXC chemokine receptor 4, is a G-protein-coupled receptors (GPCR) which can activate a variety of downstream signaling pathways 42 . Scientists has found that CXCR4 was highly expressed in COVID-19 patients 43 . Reasons are probably that CXCR4-positive pre-neutrophils (preNeu) may be released prematurely from bone marrow into blood and infiltrate lung tissue in severe patients 44 . While in OA tissue, SDF-1/CXCR4 axis has been reported to coordinate the communication between subchondral bone and articular cartilage, which is considered to be a central feature of OA occurrence and development 45 . PDK1, Phosphoinositide dependent protein kinase-1, has been found to be associated with apoptosis 46 . In OA, researches demonstrated that PDK1 could promote apoptosis of chondrocytes via modulating MAPK pathway, and has been identified as hub genes of hypermethylation low-expression genes 46,47 . Another investigation revealed that SARS-CoV-2-encoded nucleocapsid protein N could specifically enhance the M-induced apoptosis via interacting with both M and PDK1, thereby enhancing M-mediated attenuation of PDK1-PKB/Akt interactions 48 . ENO1 and TUBB3 were found to be probably related to the progression of OA, while FASN and GATA6 may participate in COVID-19. α-Enolase (ENO1) is an enolase isoform widely expressed in almost all mammal cells, characterized as a key glycolytic enzyme and an RNA-binding protein 49 . In inflammatory arthritis, enhanced surface expression of ENO1 in patient-derived peripheral blood mononuclear cells promotes inflammatory response 50 . It was also found that ENO1 could promote osteoarthritis progression through interacting with Circular RNA circNFKB1 and sustaining NF-κB signaling 49 . TUBB3 has been confirmed to be the neuron markers induced by the differentiation process in a model which mimicked a pro-nociceptive microenvironment, which could be further investigated in the field of pain in OA. FASN, Fatty acid synthase, is a key cellular enzyme in palmitate synthesis 51 . SARS-CoV-2 was expected to increase production of lipid anabolic enzymes including FASN by increasing PI3K/ AKT/mTOR/S6K signaling activity 52 . Scientists also discovered that inhibition of FASN and restoration of lipid catabolism could impede replication of coronaviruses closely related to SARS-CoV-2 51 . GATA6 is a member of a small family of zinc-finger DNA-binding transcription factors, and plays an important role in the regulation of cellular differentiation 53 . Of note, GATA6 is involved in immune and inflammatory responses by promoting the transcription of SFTPA gene. Scientists discovered that both GATA6 and SFTPA genes were upregulated in SARS-CoV-2-infected lungs 54 . Genome-wide CRISPR screens also identified GATA6 as a pro-viral host factor for SARS-CoV-2 via modulation of ACE2 53 . The remaining 8 key genes (EGR2, HIST1H3H, HIST1H4H, HIST1H4I,  HIST1H4K, MTHFD2, TUBA4A, and TUBB1) were less studied in the roles of COVID-19 and OA, emphasizing their importance in future research.
Enrichment analysis in our study indicated that hypoxia-inducible factor (HIF)-1 signaling pathway is a common pathogenesis of COVID-19 and OA. Since the discovery of HIF-1, numerous studies on the hypoxia signaling pathway have been performed 55 . The role of HIF stabilization during hypoxia has expanded from the induction of a single gene, erythropoietin, to the upregulation of a couple of hundred downstream targets   56 . Moreover, inflammatory and reparative activities of lung macrophages are regulated by β-catenin-HIF-1α signaling, with implications for the treatment of severe respiratory diseases including COVID-19 19 . HIF-1 signaling pathway is essential in the homeostasis of multiple tissues in OA as well 57 . For example, Chen et al. found that HIF-1-VEGF-Notch mediated angiogenesis in temporomandibular joint osteoarthritis 58 . Another in vitro experiment showed that HIF-1α facilitated osteocyte-mediated osteoclastogenesis by activating JAK2/STAT3 pathway 59 . Scientists found that HIF-1α maintained mouse articular cartilage stabilization through suppression of NF-κB signaling 60 , and the interaction of HIF1α and β-catenin inhibited matrix metalloproteinase 13 expression so that preventing cartilage damage 61 . Several chemical agents and drugs have been utilized as potential therapeutic targets against COVID-19 or OA. However, up to now, no drugs were identified to treat individuals with both COVID-19 and OA. In our study, we explored several drugs which could be used as possible targets. Dimethyloxalylglycine is an inhibitor Cephaeline acted by repressing HIF-1α ± and disturbing intracellular redox homeostasis. Cephaeline were discovered to inhibit SARS-CoV-2 with EC50 values of low nanomolar levels potently 63 . Another extracted drug was Cycloheximide, which is an inhibitor of eukaryotic protein synthesis. Cycloheximide inhibits ferroptosis and inhibits autophagy. In clinical trials, cycloheximide showed potent activity against human coronaviruses 64 .
Although previous studies have respectively explored the pivotal genes associated with each COVID-19 or OA, studies exploring the common molecular mechanisms between the two through bioinformatic approaches are still in vacancy. In this study, for the first time, we explored and identified common DEGs and hub genes of both COVID-19 and OA which may help to further elucidate the pathogenesis of both. However, there are also some limitations in our study. Firstly, the data were downloaded from a public database and input errors could not be assessed. Second, in this study we have used two datasets-one is RNA-seq for COVID-19 and the other one is microarray data for OA. However, microarray data has some limitations over RNA-seq data. Furthermore, the sample sizes, age and sex of COVID-19 and OA are not entirely balanced. Last, this study requires external experimental validation to verify our findings and the function of hub genes needs to be further validated in an in vitro model.
Overall, we identified common DEGs and hub genes for both COVID-19 and OA, and performed multiple bioinformatics analysis based on them. COVID-19 and OA were found to share some common pathogenic mechanism that may be mediated by specific pivotal genes. This study provides a potential horizon for further investigation of the molecular mechanisms, finding novel drugs, developing individual-based therapy for patients with both COVID-19 and OA, and paving the road towards the treatment of long-term COVID-19.